|
A dictionary coder, also sometimes known as a substitution coder, is a class of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary') maintained by the encoder. When the encoder finds such a match, it substitutes a reference to the string's position in the data structure. == Methods and applications == Some dictionary coders use a 'static dictionary', one whose full set of strings is determined before coding begins and does not change during the coding process. This approach is most often used when the message or set of messages to be encoded is fixed and large; for instance, an application that stores the contents of a book in the limited storage space of a PDA generally builds a static dictionary from a concordance of the text and then uses that dictionary to compress the verses. This scheme of using Huffman coding to represent indices into a concordance has been called "Huffword".〔Ian H. Witten, Alistair Moffat, and Timothy C. Bell. ''Managing Gigabytes''. New York: Van Nostrand Reinhold, 1994. ISBN 9780442018634.〕 In a related and more general method, a dictionary is built from redundancy extracted from a data environment (various input streams) which dictionary is then used statically to compress a further input stream. For example, a dictionary is built from old English texts then is used to compress a book.〔Rodney J. Smith. ''Streaming Compression System Using Dynamic Connection Groups'', US patent 5,748,955, priority date 20 December 1993.〕 More common are methods where the dictionary starts in some predetermined state but the contents change during the encoding process, based on the data that has already been encoded. Both the LZ77 and LZ78 algorithms work on this principle. In LZ77, a circular buffer called the "sliding window" holds the last ''N'' bytes of data processed. This window serves as the dictionary, effectively storing ''every'' substring that has appeared in the past ''N'' bytes as dictionary entries. Instead of a single index identifying a dictionary entry, two values are needed: the ''length'', indicating the length of the matched text, and the ''offset'' (also called the ''distance''), indicating that the match is found in the sliding window starting ''offset'' bytes before the current text. LZ78 uses a more explicit dictionary structure; at the beginning of the encoding process, the dictionary only needs to contain entries for the symbols of the alphabet used in the text to be compressed, but the indexes are numbered so as to leave spaces for many more entries. At each step of the encoding process, the longest entry in the dictionary that matches the text is found, and its index is written to the output; the combination of that entry and the character that followed it in the text is then added to the dictionary as a new entry. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Dictionary coder」の詳細全文を読む スポンサード リンク
|